Comparative Analysis of Principal Components Can be Misleading.
Abstract
Most existing methods for modeling trait evolution are univariate, although researchers are often interested in investigating evolutionary patterns and processes across multiple traits. Principal components analysis (PCA) is commonly used to reduce the dimensionality of multivariate data so that univariate trait models can be fit to individual principal components. The problem with using standard PCA on phylogenetically structured data has been pointed out previously, yet standard PCA continues to be widely used in the literature. Here we demonstrate precisely how using standard PCA can mislead inferences: the first few principal components of traits evolved under constant-rate multivariate Brownian motion will appear to have evolved via an "early burst" process. A phylogenetic PCA (pPCA) has been proposed to alleviate these issues. However, when the true model of trait evolution deviates from the model assumed in the calculation of the pPCA axes, we find that pPCA suffers from artifacts similar to those of standard PCA. We show that data sets with high effective dimensionality are particularly likely to lead to erroneous inferences. Ultimately, all of the problems we report stem from the same underlying issue: by considering only the first few principal components as univariate traits, we are effectively examining a biased sample of a multivariate pattern. These results highlight the need for truly multivariate phylogenetic comparative methods. As these methods are still being developed, we discuss potential alternative strategies for using and interpreting models fit to univariate axes of multivariate data.
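As a rough illustration of the two ordinations contrasted in the abstract, the sketch below simulates traits under constant-rate multivariate Brownian motion on a toy tree and computes both standard PCA and a phylogenetic PCA. Everything in it is an assumption made for illustration: the balanced eight-tip tree, the identity rate matrix, the function names, and the choice of a generalized-least-squares formulation of pPCA (phylogenetic mean plus evolutionary rate matrix). It only shows how the axes are obtained; it does not fit trait models to the scores and so does not reproduce the paper's results.

```python
import numpy as np


def balanced_tree_cov(depth):
    """Phylogenetic covariance for a balanced binary tree with unit branch lengths.

    Entry (i, j) is the path length shared by tips i and j from the root.
    (Toy tree, assumed here purely for illustration.)
    """
    n = 2 ** depth
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Tips diverge at the highest bit where their indices differ.
            C[i, j] = depth - (i ^ j).bit_length()
    return C


def simulate_bm(C, R, rng):
    """Draw tip values under multivariate Brownian motion.

    Rows (species) covary according to C, columns (traits) according to R.
    """
    L_C = np.linalg.cholesky(C)
    L_R = np.linalg.cholesky(R)
    Z = rng.standard_normal((C.shape[0], R.shape[0]))
    return L_C @ Z @ L_R.T


def _sorted_eig(S):
    """Eigen-decomposition of a symmetric matrix, largest eigenvalue first."""
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


def standard_pca(X):
    """Standard PCA: centre on the arithmetic mean, ignore the tree."""
    Xc = X - X.mean(axis=0)
    evals, evecs = _sorted_eig(Xc.T @ Xc / (X.shape[0] - 1))
    return evals, Xc @ evecs


def phylo_pca(X, C):
    """Phylogenetic PCA (assumed GLS formulation): centre on the phylogenetic
    mean and eigen-decompose the estimated evolutionary rate matrix."""
    n = X.shape[0]
    Cinv = np.linalg.inv(C)
    ones = np.ones((n, 1))
    a = (ones.T @ Cinv @ X) / (ones.T @ Cinv @ ones)  # 1 x m phylogenetic mean
    Xc = X - a
    evals, evecs = _sorted_eig(Xc.T @ Cinv @ Xc / (n - 1))
    return evals, Xc @ evecs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = balanced_tree_cov(3)                 # 8 tips
    X = simulate_bm(C, np.eye(4), rng)       # 4 independent, equal-rate traits
    ev_std, _ = standard_pca(X)
    ev_phy, _ = phylo_pca(X, C)
    print("standard PCA variance share:", ev_std / ev_std.sum())
    print("pPCA variance share:        ", ev_phy / ev_phy.sum())
```

To explore the artifacts the abstract describes, one would pass the leading columns of either score matrix to a univariate model-fitting routine as if they were single traits; that step is deliberately left out here.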
Similar resources
Running head: PCA IN COMPARATIVE ANALYSES Comparative analysis of principal components can be misleading
Most existing methods for modeling trait evolution are univariate, while researchers are often interested in investigating evolutionary patterns and processes across multiple traits. Principal components analysis (PCA) is commonly used to reduce the dimensionality of multivariate data as univariate trait models can be fit to the individual principal components. The problem with using standard PCA ...
On the Construct Validity of the Reading Section of the University of Tehran English Proficiency Test
The University of Tehran administers a test known as the University of Tehran English Proficiency Test (the UTEPT) to PhD candidates on a yearly basis. By definition, the test can be considered a high-stakes one. The validity of high-stakes tests needs to be known (Roever, 2001). As Mesick (1988) maintains, if the validity of high-stakes tests is not known, it might have some undesirable consequen...
A comparative study on the morphometric and meristic characteristics study of Alosa caspia (Eichwald, 1838) populations in the southern Caspian Sea the basin
The present study was conducted on the comparison of morphometric and meristic factors among Alosa caspia (Eichwald, 1838) populations in the southern Caspian Sea basin. A total of 285 Alosa caspia specimens were caught from three localities, from west to the east of Caspian Sea including Guilan (Bandar Anzali), Mazandaran (Sari) and Golestan (Miankale) provinces. Thirty morphometric and ten me...
General practitioners' views on key factors affecting their desired income: A principal component analysis approach
Background: Based on the target income hypothesis, the economic behavior of physicians is mainly affected by their target income. This study aimed at designing an instrument to explain how general practitioners (GPs) set their desired income. Methods: A self-administered questionnaire of affecting factors on GPs' target income was extracted from literature reviews and a small qual...
Functional Analysis of Iranian Temperature and Precipitation by Using Functional Principal Components Analysis
Extended Abstract. When data are in the form of continuous functions, they may challenge classical methods of data analysis based on arguments in finite dimensional spaces, and therefore need theoretical justification. Infinite dimensionality of spaces that data belong to, leads to major statistical methodologies and new insights for analyzing them, which is called functional data analysis (FDA...
Journal: Systematic Biology
Volume 64, Issue 4
Publication date: 2015